{ "cells": [ { "cell_type": "markdown", "id": "9cc3b252-0afc-404f-a7c2-706d0a7e3c89", "metadata": { "editable": true, "id": "9cc3b252-0afc-404f-a7c2-706d0a7e3c89", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "source": [ "### Asking scientific questions of models - Exercises & Answers" ] }, { "cell_type": "markdown", "id": "a734dcce-afbf-409a-8c38-ce4a535f4ea5", "metadata": { "id": "a734dcce-afbf-409a-8c38-ce4a535f4ea5" }, "source": [ "The exercises here are designed to get you comfortable using models to make predictions and having them answer questions of interest, as opposed to relying on a suite of tests picked from a flowchart." ] }, { "cell_type": "markdown", "id": "0cde8377-c11f-4474-95f2-7e6d353ecba1", "metadata": { "id": "0cde8377-c11f-4474-95f2-7e6d353ecba1" }, "source": [ "## Traditional approaches from a model-based perspective\n", "To make things concrete, let's run some standard analyses, such as t-tests and ANOVAs, from the perspective of a linear model. We won't interpret the coefficients; we'll just get the model to tell us the answer directly and compare it to the traditional answer.\n", "\n", "### a. Imports\n", "Import `pandas`, `pingouin`, `statsmodels.formula.api`, `seaborn`, and also `marginaleffects`." 
] }, { "cell_type": "code", "execution_count": 1, "id": "83f7ae21-b1f1-4d9b-93b2-720c03bbe75b", "metadata": { "editable": true, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1723926881418, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "83f7ae21-b1f1-4d9b-93b2-720c03bbe75b", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Your answer here\n", "import pandas as pd\n", "import pingouin as pg\n", "import statsmodels.formula.api as smf\n", "import seaborn as sns\n", "import marginaleffects as me\n", "\n", "sns.set_style('whitegrid')" ] }, { "cell_type": "markdown", "id": "9dc75c2e-25e4-4ed9-8f57-b60a682e038e", "metadata": { "id": "9dc75c2e-25e4-4ed9-8f57-b60a682e038e" }, "source": [ "### b. Loading up data\n", "We will continue our exploration of the 'Teaching Ratings' dataset here, and use `marginaleffects` to explore the consequences of our models.\n", "\n", "The data can be found here: https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv\n", "\n", "Read it into a dataframe called `profs`, and show the top 5 rows." ] }, { "cell_type": "code", "execution_count": 2, "id": "a87dbb65-80a5-4a29-a8b6-e26838ab5792", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 224 }, "editable": true, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1723926881418, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "a87dbb65-80a5-4a29-a8b6-e26838ab5792", "outputId": "efe2f990-4fea-415e-f736-d6ae07fc0649", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
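<table>
<thead><tr><th></th><th>rownames</th><th>minority</th><th>age</th><th>gender</th><th>credits</th><th>beauty</th><th>eval</th><th>division</th><th>native</th><th>tenure</th><th>students</th><th>allstudents</th><th>prof</th></tr></thead>
<tbody>
<tr><th>0</th><td>1</td><td>yes</td><td>36</td><td>female</td><td>more</td><td>0.289916</td><td>4.3</td><td>upper</td><td>yes</td><td>yes</td><td>24</td><td>43</td><td>1</td></tr>
<tr><th>1</th><td>2</td><td>no</td><td>59</td><td>male</td><td>more</td><td>-0.737732</td><td>4.5</td><td>upper</td><td>yes</td><td>yes</td><td>17</td><td>20</td><td>2</td></tr>
<tr><th>2</th><td>3</td><td>no</td><td>51</td><td>male</td><td>more</td><td>-0.571984</td><td>3.7</td><td>upper</td><td>yes</td><td>yes</td><td>55</td><td>55</td><td>3</td></tr>
<tr><th>3</th><td>4</td><td>no</td><td>40</td><td>female</td><td>more</td><td>-0.677963</td><td>4.3</td><td>upper</td><td>yes</td><td>yes</td><td>40</td><td>46</td><td>4</td></tr>
<tr><th>4</th><td>5</td><td>no</td><td>31</td><td>female</td><td>more</td><td>1.509794</td><td>4.4</td><td>upper</td><td>yes</td><td>yes</td><td>42</td><td>48</td><td>5</td></tr>
</tbody>
</table>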
\n", "
" ], "text/plain": [ " rownames minority age gender credits beauty eval division native \\\n", "0 1 yes 36 female more 0.289916 4.3 upper yes \n", "1 2 no 59 male more -0.737732 4.5 upper yes \n", "2 3 no 51 male more -0.571984 3.7 upper yes \n", "3 4 no 40 female more -0.677963 4.3 upper yes \n", "4 5 no 31 female more 1.509794 4.4 upper yes \n", "\n", " tenure students allstudents prof \n", "0 yes 24 43 1 \n", "1 yes 17 20 2 \n", "2 yes 55 55 3 \n", "3 yes 40 46 4 \n", "4 yes 42 48 5 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Read in dataset\n", "profs = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv')\n", "profs.head()" ] }, { "cell_type": "markdown", "id": "76eeeca9-97b3-463a-b485-5878d5b0124d", "metadata": { "id": "76eeeca9-97b3-463a-b485-5878d5b0124d" }, "source": [ "### c. The t-test as a marginal effect\n", "We will recreate a t-test with model-based predictions.\n", "\n", "First, conduct a t-test with `pingouin`, comparing the evaluation score of male and female professors." ] }, { "cell_type": "code", "execution_count": 3, "id": "80728e27-f1d7-4b6d-8e87-9d9a7676463d", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 114 }, "editable": true, "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1723926881418, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "80728e27-f1d7-4b6d-8e87-9d9a7676463d", "outputId": "61160ca2-552d-4cbc-86a0-e110873cf56c", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
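<table>
<thead><tr><th></th><th>T</th><th>dof</th><th>alternative</th><th>p-val</th><th>CI95%</th><th>cohen-d</th><th>BF10</th><th>power</th></tr></thead>
<tbody>
<tr><th>T-test</th><td>-3.266711</td><td>425.755804</td><td>two-sided</td><td>0.001176</td><td>[-0.27, -0.07]</td><td>0.305901</td><td>17.548</td><td>0.900288</td></tr>
</tbody>
</table>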
\n", "
" ], "text/plain": [ " T dof alternative p-val CI95% cohen-d \\\n", "T-test -3.266711 425.755804 two-sided 0.001176 [-0.27, -0.07] 0.305901 \n", "\n", " BF10 power \n", "T-test 17.548 0.900288 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# T-test with pingouin\n", "pg.ttest(profs.query('gender == \"female\"')['eval'],\n", " profs.query('gender == \"male\"')['eval']\n", " )" ] }, { "cell_type": "markdown", "id": "7af5a6c2-aa6b-4748-a1ed-194884866b78", "metadata": { "id": "7af5a6c2-aa6b-4748-a1ed-194884866b78" }, "source": [ "Now fit a regression model with `statsmodels` predicting evaluations from gender. Call the model `ttest`. Check the summary, and remember the coefficient will equal the mean difference, which we can check our predictions against." ] }, { "cell_type": "code", "execution_count": 4, "id": "69dadb4a-7a67-4d8a-8735-fad9245542c8", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 253 }, "editable": true, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1723926881418, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "69dadb4a-7a67-4d8a-8735-fad9245542c8", "outputId": "dc7f5e9e-ef3a-46b3-dc75-a56ee5b415dc", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: eval R-squared: 0.022
Model: OLS Adj. R-squared: 0.020
No. Observations: 463 F-statistic: 10.56
Covariance Type: nonrobust Prob (F-statistic): 0.00124
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept 3.9010 0.039 99.187 0.000 3.824 3.978
gender[T.male] 0.1680 0.052 3.250 0.001 0.066 0.270


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.022 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.020 \\\\\n", "\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 10.56 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 0.00124 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & 3.9010 & 0.039 & 99.187 & 0.000 & 3.824 & 3.978 \\\\\n", "\\textbf{gender[T.male]} & 0.1680 & 0.052 & 3.250 & 0.001 & 0.066 & 0.270 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: eval R-squared: 0.022\n", "Model: OLS Adj. R-squared: 0.020\n", "No. 
Observations: 463 F-statistic: 10.56\n", "Covariance Type: nonrobust Prob (F-statistic): 0.00124\n", "==================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "----------------------------------------------------------------------------------\n", "Intercept 3.9010 0.039 99.187 0.000 3.824 3.978\n", "gender[T.male] 0.1680 0.052 3.250 0.001 0.066 0.270\n", "==================================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Fit the model\n", "ttest = smf.ols('eval ~ gender', data=profs).fit()\n", "ttest.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "309098eb-c5fa-4346-8302-2c8449632495", "metadata": { "id": "309098eb-c5fa-4346-8302-2c8449632495" }, "source": [ "Next, use `marginaleffects` to create a datagrid that will give predictions for female and male professors, and pass it to `me.predictions` to make the predictions. Examine the values." ] }, { "cell_type": "code", "execution_count": 5, "id": "e68cdbde-8412-462d-b45e-fcf709251814", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 213 }, "editable": true, "executionInfo": { "elapsed": 370, "status": "ok", "timestamp": 1723926881781, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "e68cdbde-8412-462d-b45e-fcf709251814", "outputId": "5c953256-76f9-4720-c955-ef3df126a751", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (2, 21)
genderrowidestimatestd_errorstatisticp_values_valueconf_lowconf_highrownamesminorityagecreditsbeautyevaldivisionnativetenurestudentsallstudentsprof
stri32f64f64f64f64f64f64f64i64stri64strf64f64strstrstri64i64i64
"male"04.069030.033548121.2883530.0inf4.0032764.134784305"no"52"more"6.2635e-83.998272"upper""yes""yes"121550
"female"13.9010260.0393399.1874650.0inf3.8239413.978111305"no"52"more"6.2635e-83.998272"upper""yes""yes"121550
" ], "text/plain": [ "shape: (2, 8)\n", "┌────────┬──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐\n", "│ gender ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞════════╪══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡\n", "│ male ┆ 4.07 ┆ 0.0335 ┆ 121 ┆ 0 ┆ inf ┆ 4 ┆ 4.13 │\n", "│ female ┆ 3.9 ┆ 0.0393 ┆ 99.2 ┆ 0 ┆ inf ┆ 3.82 ┆ 3.98 │\n", "└────────┴──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘\n", "\n", "Columns: gender, rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, minority, age, credits, beauty, eval, division, native, tenure, students, allstudents, prof" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Datagrid\n", "datagrid = me.datagrid(ttest, gender=['male', 'female'])\n", "\n", "# Predictions\n", "preds = me.predictions(ttest, newdata=datagrid)\n", "preds" ] }, { "cell_type": "markdown", "id": "aa66843a-07cf-4a7f-979d-89502406757d", "metadata": { "id": "aa66843a-07cf-4a7f-979d-89502406757d" }, "source": [ "Repeat the predictions step but use the `hypothesis` test to get the difference between the predictions." ] }, { "cell_type": "code", "execution_count": 6, "id": "1d815b2a-3c26-46ff-9066-231784dd9d4c", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 128 }, "editable": true, "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1723926881781, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "1d815b2a-3c26-46ff-9066-231784dd9d4c", "outputId": "dbbf6432-97ab-488e-c701-0648cfd03e3d", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
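<table>
<thead><tr><th>term</th><th>estimate</th><th>std_error</th><th>statistic</th><th>p_value</th><th>s_value</th><th>conf_low</th><th>conf_high</th></tr>
<tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead>
<tbody>
<tr><td>"Row 1 - Row 2"</td><td>0.168004</td><td>0.051695</td><td>3.249939</td><td>0.001154</td><td>9.758768</td><td>0.066685</td><td>0.269324</td></tr>
</tbody>
</table>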
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬──────┬─────────┬──────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪══════╪═════════╪══════╪════════╪═══════╡\n", "│ Row 1 - Row 2 ┆ 0.168 ┆ 0.0517 ┆ 3.25 ┆ 0.00115 ┆ 9.76 ┆ 0.0667 ┆ 0.269 │\n", "└───────────────┴──────────┴───────────┴──────┴─────────┴──────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Comparison is done via hypothesis\n", "me.predictions(ttest, newdata=datagrid, hypothesis='pairwise')" ] }, { "cell_type": "markdown", "id": "7af377ff-fbb7-4597-88ee-756f80eda804", "metadata": { "id": "7af377ff-fbb7-4597-88ee-756f80eda804" }, "source": [ "### d. Carrying out an ANOVA with linear models and marginal effects\n", "Lets now demonstrate how an ANOVA can be executed easily with a linear model and the examination of marginal effects.\n", "\n", "First, use `pinoguin` to carry out an ANOVA on teaching evaluations, using tenure and gender as the factors - that is, examine whether male and female professors differ in their evaluations depending on whether they have achieved tenure or not." 
] }, { "cell_type": "code", "execution_count": 7, "id": "ece27b38-a2db-48f8-97c6-e8a636037065", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 173 }, "editable": true, "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1723926881781, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "ece27b38-a2db-48f8-97c6-e8a636037065", "outputId": "79d963c0-ca43-41c0-d431-4d41fcd2bd8b", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
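<table>
<thead><tr><th></th><th>Source</th><th>SS</th><th>DF</th><th>MS</th><th>F</th><th>p-unc</th><th>np2</th></tr></thead>
<tbody>
<tr><th>0</th><td>gender</td><td>3.628914</td><td>1.0</td><td>3.628914</td><td>12.615338</td><td>0.000422</td><td>0.026749</td></tr>
<tr><th>1</th><td>tenure</td><td>2.829395</td><td>1.0</td><td>2.829395</td><td>9.835936</td><td>0.001821</td><td>0.020979</td></tr>
<tr><th>2</th><td>gender * tenure</td><td>4.187913</td><td>1.0</td><td>4.187913</td><td>14.558608</td><td>0.000154</td><td>0.030743</td></tr>
<tr><th>3</th><td>Residual</td><td>132.035435</td><td>459.0</td><td>0.287659</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
</tbody>
</table>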
\n", "
" ], "text/plain": [ " Source SS DF MS F p-unc np2\n", "0 gender 3.628914 1.0 3.628914 12.615338 0.000422 0.026749\n", "1 tenure 2.829395 1.0 2.829395 9.835936 0.001821 0.020979\n", "2 gender * tenure 4.187913 1.0 4.187913 14.558608 0.000154 0.030743\n", "3 Residual 132.035435 459.0 0.287659 NaN NaN NaN" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# A Pingouin ANOVA\n", "pg.anova(data=profs, dv='eval', between=['gender', 'tenure'])" ] }, { "cell_type": "markdown", "id": "b6b860ac-19c3-4e85-8a55-2613a24b9f2a", "metadata": { "id": "b6b860ac-19c3-4e85-8a55-2613a24b9f2a" }, "source": [ "This suggests there is a main effect of gender, tenure and an interaction. Usually we'd need to do post-hoc tests to explore these. But we can rely on marginal effects for a simpler interpretation. First, fit a linear regression that is the same as the ANOVA, predicting evaluation measures from gender, tenure, and its interaction. Call it an `anova_model`." ] }, { "cell_type": "code", "execution_count": 8, "id": "5f87872b-c485-41e2-818b-e52a298b17a2", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 295 }, "editable": true, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1723926881782, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "5f87872b-c485-41e2-818b-e52a298b17a2", "outputId": "d6700cb9-298f-468c-b937-42fadb421a43", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: eval R-squared: 0.072
Model: OLS Adj. R-squared: 0.066
No. Observations: 463 F-statistic: 11.82
Covariance Type: nonrobust Prob (F-statistic): 1.80e-07
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept 3.8600 0.076 50.890 0.000 3.711 4.009
gender[T.male] 0.5362 0.106 5.047 0.000 0.327 0.745
tenure[T.yes] 0.0552 0.088 0.627 0.531 -0.118 0.228
gender[T.male]:tenure[T.yes] -0.4610 0.121 -3.816 0.000 -0.699 -0.224


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.072 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.066 \\\\\n", "\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 11.82 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 1.80e-07 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & 3.8600 & 0.076 & 50.890 & 0.000 & 3.711 & 4.009 \\\\\n", "\\textbf{gender[T.male]} & 0.5362 & 0.106 & 5.047 & 0.000 & 0.327 & 0.745 \\\\\n", "\\textbf{tenure[T.yes]} & 0.0552 & 0.088 & 0.627 & 0.531 & -0.118 & 0.228 \\\\\n", "\\textbf{gender[T.male]:tenure[T.yes]} & -0.4610 & 0.121 & -3.816 & 0.000 & -0.699 & -0.224 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: eval R-squared: 0.072\n", "Model: OLS Adj. R-squared: 0.066\n", "No. 
Observations: 463 F-statistic: 11.82\n", "Covariance Type: nonrobust Prob (F-statistic): 1.80e-07\n", "================================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------------------------\n", "Intercept 3.8600 0.076 50.890 0.000 3.711 4.009\n", "gender[T.male] 0.5362 0.106 5.047 0.000 0.327 0.745\n", "tenure[T.yes] 0.0552 0.088 0.627 0.531 -0.118 0.228\n", "gender[T.male]:tenure[T.yes] -0.4610 0.121 -3.816 0.000 -0.699 -0.224\n", "================================================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# A linear model equivalent\n", "anova_model = smf.ols('eval ~ gender * tenure', data=profs).fit()\n", "anova_model.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "81931246-3836-4ad2-acc8-2a8001ca2971", "metadata": { "id": "81931246-3836-4ad2-acc8-2a8001ca2971" }, "source": [ "With a fitted model, we can easily explore the implications via the predictions.\n", "\n", "First, make a datagrid that gives predictions for tenure and gender. Call it `anova_predmat`, and then use the model to predict those scores, storing them in a dataframe called `anova_predictions`." 
] }, { "cell_type": "code", "execution_count": 9, "id": "cb771ac4-c3cc-4ac6-87f8-b6c46b38172c", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 309 }, "editable": true, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1723926881782, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "cb771ac4-c3cc-4ac6-87f8-b6c46b38172c", "outputId": "b0babe63-dba5-42b7-83f7-8a0d60fb242c", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Your answer here\n", "# Prediction grid\n", "anova_predmat = me.datagrid(anova_model,\n", " tenure=['yes', 'no'],\n", " gender=['male', 'female'])\n", "\n", "# Output\n", "anova_predictions = me.predictions(anova_model, newdata=anova_predmat)" ] }, { "cell_type": "markdown", "id": "dfbe945e-3969-4513-8cfe-3e9d9279abf8", "metadata": { "id": "dfbe945e-3969-4513-8cfe-3e9d9279abf8" }, "source": [ "It is always sensible to plot predictions before we begin interpreting them. Use `seaborn` to create a line plot that illustrates the interaction. Any way you want is fine - as long as the estimate is on the y axis." ] }, { "cell_type": "code", "execution_count": 10, "id": "71c09c74-134f-4e35-a722-e9232800ce21", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 159 }, "editable": true, "executionInfo": { "elapsed": 216, "status": "ok", "timestamp": 1723927069461, "user": { "displayName": "Alex Jones", "userId": "11094282981700434339" }, "user_tz": -60 }, "id": "os0rDvsqDUV-", "outputId": "c7b8f145-dd1a-4cc4-8119-ec38128e7322", "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Your answer here\n", "# plot\n", "sns.lineplot(data=anova_predictions,\n", " y='estimate', x='tenure',\n", " hue='gender')" ] }, { "cell_type": "markdown", "id": "058ddc5f-eac2-4bc8-88fe-d3593a287224", "metadata": {}, "source": [ "The ANOVA suggested we had the following results:\n", "1. A main effect of gender (differences between men and women, ignoring tenure status)\n", "2. A main effect of tenure (differences between tenured and non-tenured, ignoring gender)\n", "3. An interaction, indicating the difference between one variable (e.g. gender) at one level of the other (say tenured) is different to the other (confusing!)\n", "\n", "Have the model make predictions, and to explore the main effects, use the `by` keyword to ignore one variable and the `hypothesis` keyword to check the differences." ] }, { "cell_type": "code", "execution_count": 11, "id": "748d0f0b-2dec-4771-ab4d-0ad0e44f1e7e", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
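<table>
<thead><tr><th>term</th><th>estimate</th><th>std_error</th><th>statistic</th><th>p_value</th><th>s_value</th><th>conf_low</th><th>conf_high</th></tr>
<tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead>
<tbody>
<tr><td>"Row 1 - Row 2"</td><td>0.175352</td><td>0.060417</td><td>2.902376</td><td>0.003703</td><td>8.076918</td><td>0.056937</td><td>0.293766</td></tr>
</tbody>
</table>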
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬─────┬─────────┬──────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪═════╪═════════╪══════╪════════╪═══════╡\n", "│ Row 1 - Row 2 ┆ 0.175 ┆ 0.0604 ┆ 2.9 ┆ 0.0037 ┆ 8.08 ┆ 0.0569 ┆ 0.294 │\n", "└───────────────┴──────────┴───────────┴─────┴─────────┴──────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Main effects\n", "# Gender\n", "me.predictions(anova_model, newdata=anova_predmat, by='gender', hypothesis='pairwise')\n", "\n", "# Tenure\n", "me.predictions(anova_model, newdata=anova_predmat, by='tenure', hypothesis='pairwise')" ] }, { "cell_type": "markdown", "id": "39c2d04c-5d51-42eb-a11c-12bf50d0c9a6", "metadata": {}, "source": [ "Now use the predictions to figure out the 'cause' of the interaction. There are a few ways to do this. You can compare men and women professors who are tenured, and see if that difference is significant, and then see if the difference between non-tenured professors is also significant. What do you observe?" ] }, { "cell_type": "code", "execution_count": 12, "id": "50197fde-97b4-4ab9-826d-bdc0d4b5d3b2", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (2, 10)
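<table>
<thead><tr><th>gender</th><th>term</th><th>contrast</th><th>estimate</th><th>std_error</th><th>statistic</th><th>p_value</th><th>s_value</th><th>conf_low</th><th>conf_high</th></tr>
<tr><td>str</td><td>str</td><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead>
<tbody>
<tr><td>"female"</td><td>"tenure"</td><td>"mean(yes) - mean(no)"</td><td>0.055172</td><td>0.08796</td><td>0.627241</td><td>0.530501</td><td>0.914573</td><td>-0.117227</td><td>0.227572</td></tr>
<tr><td>"male"</td><td>"tenure"</td><td>"mean(yes) - mean(no)"</td><td>-0.405876</td><td>0.082847</td><td>-4.899093</td><td>9.6280e-7</td><td>19.98626</td><td>-0.568254</td><td>-0.243499</td></tr>
</tbody>
</table>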
" ], "text/plain": [ "shape: (2, 10)\n", "┌────────┬────────┬──────────────────────┬──────────┬───┬──────────┬───────┬────────┬────────┐\n", "│ gender ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n", "╞════════╪════════╪══════════════════════╪══════════╪═══╪══════════╪═══════╪════════╪════════╡\n", "│ female ┆ tenure ┆ mean(yes) - mean(no) ┆ 0.0552 ┆ … ┆ 0.531 ┆ 0.915 ┆ -0.117 ┆ 0.228 │\n", "│ male ┆ tenure ┆ mean(yes) - mean(no) ┆ -0.406 ┆ … ┆ 9.63e-07 ┆ 20 ┆ -0.568 ┆ -0.243 │\n", "└────────┴────────┴──────────────────────┴──────────┴───┴──────────┴───────┴────────┴────────┘\n", "\n", "Columns: gender, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.predictions(anova_model, newdata=anova_predmat, hypothesis='b1=b2') # Tenured, NON significant\n", "me.predictions(anova_model, newdata=anova_predmat, hypothesis='b3=b4') # Non-tenured, significant, males > females\n", "\n", "# Slopes\n", "me.slopes(anova_model, newdata=anova_predmat, variables='tenure', by='gender')" ] }, { "cell_type": "markdown", "id": "4a9d8c72-61ef-47d9-a4ad-a1439fd6bdce", "metadata": {}, "source": [ "### e. ANCOVA done with marginal effects\n", "Let us now add some complexity. ANCOVA is often described as an ANOVA 'adjusting' for another variable. We know it simply as a general linear model, with some kind of categorical predictor, and other continuous predictors that are also in the model. There can be as many categorical predictors and interactions between them as needed, as well as the continuous covariates.\n", "\n", "ANCOVA is a confusing and unnecessary term. Linear models are simpler, and here we will see how. 
\n", "\n", "First, carry out an ANCOVA with `pingouin` that looks at teaching evaluations between men and women (the categorical predictor), but adjusts for their beauty (the continuous covariate). Print the result. What does it tell you?\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "dd07d46a-4774-4d96-bc93-0f6348da1299", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
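<table>
<thead><tr><th></th><th>Source</th><th>SS</th><th>DF</th><th>F</th><th>p-unc</th><th>np2</th></tr></thead>
<tbody>
<tr><th>0</th><td>gender</td><td>4.346745</td><td>1</td><td>15.055490</td><td>0.000120</td><td>0.031692</td></tr>
<tr><th>1</th><td>beauty</td><td>6.243877</td><td>1</td><td>21.626444</td><td>0.000004</td><td>0.044903</td></tr>
<tr><th>2</th><td>Residual</td><td>132.808865</td><td>460</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
</tbody>
</table>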
\n", "
" ], "text/plain": [ " Source SS DF F p-unc np2\n", "0 gender 4.346745 1 15.055490 0.000120 0.031692\n", "1 beauty 6.243877 1 21.626444 0.000004 0.044903\n", "2 Residual 132.808865 460 NaN NaN NaN" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# ANCOVA in pingouin\n", "pg.ancova(data=profs, dv='eval', between='gender', covar='beauty')" ] }, { "cell_type": "markdown", "id": "f451aa14-65aa-44b0-bce4-b6cb7f7a1f66", "metadata": {}, "source": [ "You should see that there are significant effects of both gender and beauty, but there's little information of use here. \n", "\n", "Fit a linear model that is equivalent to this ANCOVA, called `ancova_mod`. Print the summary." ] }, { "cell_type": "code", "execution_count": 14, "id": "2dac5a1f-0aeb-489e-aaf2-d59f67b7c76f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: eval R-squared: 0.066
Model: OLS Adj. R-squared: 0.062
No. Observations: 463 F-statistic: 16.33
Covariance Type: nonrobust Prob (F-statistic): 1.41e-07
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept 3.8838 0.039 100.468 0.000 3.808 3.960
gender[T.male] 0.1978 0.051 3.880 0.000 0.098 0.298
beauty 0.1486 0.032 4.650 0.000 0.086 0.211


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.066 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.062 \\\\\n", "\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 16.33 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 1.41e-07 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & 3.8838 & 0.039 & 100.468 & 0.000 & 3.808 & 3.960 \\\\\n", "\\textbf{gender[T.male]} & 0.1978 & 0.051 & 3.880 & 0.000 & 0.098 & 0.298 \\\\\n", "\\textbf{beauty} & 0.1486 & 0.032 & 4.650 & 0.000 & 0.086 & 0.211 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: eval R-squared: 0.066\n", "Model: OLS Adj. R-squared: 0.062\n", "No. 
Observations: 463 F-statistic: 16.33\n", "Covariance Type: nonrobust Prob (F-statistic): 1.41e-07\n", "==================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "----------------------------------------------------------------------------------\n", "Intercept 3.8838 0.039 100.468 0.000 3.808 3.960\n", "gender[T.male] 0.1978 0.051 3.880 0.000 0.098 0.298\n", "beauty 0.1486 0.032 4.650 0.000 0.086 0.211\n", "==================================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "ancova_mod = smf.ols('eval ~ gender + beauty', data=profs).fit()\n", "ancova_mod.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "42ec728f-3e19-4308-89f9-d1bed94dbcd1", "metadata": {}, "source": [ "Now make a prediction about teaching evaluations for females and males. As the variable we want to control for is in the model (beauty), we don't need to make any predictions for it." ] }, { "cell_type": "code", "execution_count": 15, "id": "e5767d2b-27f9-407c-8596-2de46f959801", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"0.197810.050983.8801410.00010413.2256460.0978910.297729
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬──────┬──────────┬──────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪══════╪══════════╪══════╪════════╪═══════╡\n", "│ Row 1 - Row 2 ┆ 0.198 ┆ 0.051 ┆ 3.88 ┆ 0.000104 ┆ 13.2 ┆ 0.0979 ┆ 0.298 │\n", "└───────────────┴──────────┴───────────┴──────┴──────────┴──────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer\n", "# Predictions and contrast\n", "me.predictions(ancova_mod, \n", " hypothesis='pairwise',\n", " newdata=me.datagrid(ancova_mod,\n", " gender=['male', 'female'])\n", " )" ] }, { "cell_type": "markdown", "id": "35bbe0f7-eb84-420c-9c31-f330fcd29063", "metadata": {}, "source": [ "Repeat the analysis without the beauty covariate. Is the difference smaller or larger with the effect of beauty removed?" ] }, { "cell_type": "code", "execution_count": 16, "id": "d58339e8-bd2c-4514-88bf-d2e382e24c14", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"0.1680040.0516953.2499390.0011549.7587680.0666850.269324
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬──────┬─────────┬──────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪══════╪═════════╪══════╪════════╪═══════╡\n", "│ Row 1 - Row 2 ┆ 0.168 ┆ 0.0517 ┆ 3.25 ┆ 0.00115 ┆ 9.76 ┆ 0.0667 ┆ 0.269 │\n", "└───────────────┴──────────┴───────────┴──────┴─────────┴──────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer\n", "# T-test style model\n", "ancova_mod = smf.ols('eval ~ gender', data=profs).fit()\n", "\n", "# Predictions and contrast\n", "me.predictions(ancova_mod, \n", " hypothesis='pairwise',\n", " newdata=me.datagrid(ancova_mod,\n", " gender=['male', 'female'])\n", " )" ] }, { "cell_type": "markdown", "id": "53187ff3-129e-4454-846c-f3684e570af8", "metadata": {}, "source": [ "### f. Knowledge of linear models gets you out of trouble\n", "Following on from the last example, lets say we want to examine the interaction between gender and tenure status and control for beauty. Perhaps we wish to see whether our earlier ANOVA model stands up if we incorporate and control for beauty.\n", "\n", "First, try to fit one of these models in `pingouin`. Its another ANCOVA, but this time has two between factors." 
] }, { "cell_type": "code", "execution_count": 17, "id": "6651ee2b-7edf-4569-a2b2-19395540440c", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Your answer here\n", "#pg.ancova(data=profs, dv='eval', between=['tenure', 'gender'], covar='beauty')" ] }, { "cell_type": "markdown", "id": "fcc7e531-bd48-4731-9d7c-ecc44b052536", "metadata": {}, "source": [ "If you did this correctly, you should see an error - the software doesn't support it!\n", "\n", "But we can easily fit a linear model to do this. Fit a model that has an interaction between gender and tenure, and has beauty as a predictor." ] }, { "cell_type": "code", "execution_count": 18, "id": "c9400fd1-5bec-4f6a-8f31-81f0b14de217", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: eval R-squared: 0.104
Model: OLS Adj. R-squared: 0.096
No. Observations: 463 F-statistic: 13.23
Covariance Type: nonrobust Prob (F-statistic): 3.32e-10
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept 3.8804 0.075 51.883 0.000 3.733 4.027
gender[T.male] 0.4890 0.105 4.650 0.000 0.282 0.696
tenure[T.yes] 0.0076 0.087 0.087 0.930 -0.164 0.179
gender[T.male]:tenure[T.yes] -0.3668 0.121 -3.027 0.003 -0.605 -0.129
beauty 0.1289 0.032 4.032 0.000 0.066 0.192


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.104 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.096 \\\\\n", "\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 13.23 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 3.32e-10 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & 3.8804 & 0.075 & 51.883 & 0.000 & 3.733 & 4.027 \\\\\n", "\\textbf{gender[T.male]} & 0.4890 & 0.105 & 4.650 & 0.000 & 0.282 & 0.696 \\\\\n", "\\textbf{tenure[T.yes]} & 0.0076 & 0.087 & 0.087 & 0.930 & -0.164 & 0.179 \\\\\n", "\\textbf{gender[T.male]:tenure[T.yes]} & -0.3668 & 0.121 & -3.027 & 0.003 & -0.605 & -0.129 \\\\\n", "\\textbf{beauty} & 0.1289 & 0.032 & 4.032 & 0.000 & 0.066 & 0.192 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: eval R-squared: 0.104\n", "Model: OLS Adj. R-squared: 0.096\n", "No. 
Observations: 463 F-statistic: 13.23\n", "Covariance Type: nonrobust Prob (F-statistic): 3.32e-10\n", "================================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------------------------\n", "Intercept 3.8804 0.075 51.883 0.000 3.733 4.027\n", "gender[T.male] 0.4890 0.105 4.650 0.000 0.282 0.696\n", "tenure[T.yes] 0.0076 0.087 0.087 0.930 -0.164 0.179\n", "gender[T.male]:tenure[T.yes] -0.3668 0.121 -3.027 0.003 -0.605 -0.129\n", "beauty 0.1289 0.032 4.032 0.000 0.066 0.192\n", "================================================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# More complex ANCOVA\n", "ancova2 = smf.ols('eval ~ beauty + gender * tenure', data=profs).fit()\n", "ancova2.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "b201bade-ae5e-48ba-a369-87af55f8fd05", "metadata": {}, "source": [ "Once you have this model, use it to make predictions about gender and tenure as before, and work out the interaction effects. Are they the same as before?" ] }, { "cell_type": "code", "execution_count": 19, "id": "af123534-3d80-42b2-b298-86ca46d7aff9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (2, 10)
gendertermcontrastestimatestd_errorstatisticp_values_valueconf_lowconf_high
strstrstrf64f64f64f64f64f64f64
"female""tenure""mean(yes) - mean(no)"0.0076260.0873340.0873150.9304210.104044-0.1635460.178798
"male""tenure""mean(yes) - mean(no)"-0.3591430.082324-4.3625460.00001316.247228-0.520495-0.197791
" ], "text/plain": [ "shape: (2, 10)\n", "┌────────┬────────┬──────────────────────┬──────────┬───┬──────────┬───────┬────────┬────────┐\n", "│ gender ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n", "╞════════╪════════╪══════════════════════╪══════════╪═══╪══════════╪═══════╪════════╪════════╡\n", "│ female ┆ tenure ┆ mean(yes) - mean(no) ┆ 0.00763 ┆ … ┆ 0.93 ┆ 0.104 ┆ -0.164 ┆ 0.179 │\n", "│ male ┆ tenure ┆ mean(yes) - mean(no) ┆ -0.359 ┆ … ┆ 1.29e-05 ┆ 16.2 ┆ -0.52 ┆ -0.198 │\n", "└────────┴────────┴──────────────────────┴──────────┴───┴──────────┴───────┴────────┴────────┘\n", "\n", "Columns: gender, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# I will use slopes\n", "me.slopes(ancova2, newdata=anova_predmat, variables='tenure', by='gender')" ] }, { "cell_type": "markdown", "id": "de14194d-1311-42a7-85bc-9e8f0e1e229d", "metadata": {}, "source": [ "Plot it the predictions and check them against the ANOVA predictions." ] }, { "cell_type": "code", "execution_count": 20, "id": "d0bbe9dc-ab5b-4160-9cd6-a37db50e70af", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Your answer here\n", "sns.lineplot(data=me.predictions(ancova2, newdata=anova_predmat),\n", " y='estimate', x='tenure',\n", " hue='gender')" ] }, { "cell_type": "markdown", "id": "96b774ec-7b88-48b5-abb9-4b92d245367c", "metadata": {}, "source": [ "### g. Interpreting complex interactions with marginal effects\n", "If you've completed the above exercises, you've mastered 99% of the statistics used in basic psychology, and learned how to do it from a much clearer perspective. Lets now build knowledge of how to interpret an even more complex model. \n", "\n", "Lets suppose that, rather than controlling for beauty's influence on teaching evaluations for tenured and non-tenured males and females, you want to know whether it influences evaluations at these combinations. That is, you might wish to see how less attractive males are evaluated before and after tenure, and whether this change is different for females, who are typically judged more harshly on their looks. \n", "\n", "To do this, you will need an interaction between gender, beauty, and tenure. Fit a model that does this, and call it `three_interact`. Print the summary." ] }, { "cell_type": "code", "execution_count": 21, "id": "964e9828-d394-4fba-9919-214f51e87a9b", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: eval R-squared: 0.110
Model: OLS Adj. R-squared: 0.097
No. Observations: 463 F-statistic: 8.055
Covariance Type: nonrobust Prob (F-statistic): 3.03e-09
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept 3.8601 0.076 51.031 0.000 3.711 4.009
gender[T.male] 0.5076 0.107 4.741 0.000 0.297 0.718
tenure[T.yes] 0.0275 0.088 0.312 0.755 -0.146 0.201
gender[T.male]:tenure[T.yes] -0.3781 0.122 -3.100 0.002 -0.618 -0.138
beauty 0.0006 0.080 0.008 0.994 -0.156 0.157
gender[T.male]:beauty 0.1362 0.124 1.094 0.274 -0.108 0.381
beauty:tenure[T.yes] 0.1301 0.099 1.315 0.189 -0.064 0.325
gender[T.male]:beauty:tenure[T.yes] -0.0934 0.146 -0.640 0.522 -0.380 0.193


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.110 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.097 \\\\\n", "\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 8.055 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 3.03e-09 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & 3.8601 & 0.076 & 51.031 & 0.000 & 3.711 & 4.009 \\\\\n", "\\textbf{gender[T.male]} & 0.5076 & 0.107 & 4.741 & 0.000 & 0.297 & 0.718 \\\\\n", "\\textbf{tenure[T.yes]} & 0.0275 & 0.088 & 0.312 & 0.755 & -0.146 & 0.201 \\\\\n", "\\textbf{gender[T.male]:tenure[T.yes]} & -0.3781 & 0.122 & -3.100 & 0.002 & -0.618 & -0.138 \\\\\n", "\\textbf{beauty} & 0.0006 & 0.080 & 0.008 & 0.994 & -0.156 & 0.157 \\\\\n", "\\textbf{gender[T.male]:beauty} & 0.1362 & 0.124 & 1.094 & 0.274 & -0.108 & 0.381 \\\\\n", "\\textbf{beauty:tenure[T.yes]} & 0.1301 & 0.099 & 1.315 & 0.189 & -0.064 & 0.325 \\\\\n", "\\textbf{gender[T.male]:beauty:tenure[T.yes]} & -0.0934 & 0.146 & -0.640 & 0.522 & -0.380 & 0.193 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: eval R-squared: 0.110\n", "Model: OLS Adj. R-squared: 0.097\n", "No. 
Observations: 463 F-statistic: 8.055\n", "Covariance Type: nonrobust Prob (F-statistic): 3.03e-09\n", "=======================================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-------------------------------------------------------------------------------------------------------\n", "Intercept 3.8601 0.076 51.031 0.000 3.711 4.009\n", "gender[T.male] 0.5076 0.107 4.741 0.000 0.297 0.718\n", "tenure[T.yes] 0.0275 0.088 0.312 0.755 -0.146 0.201\n", "gender[T.male]:tenure[T.yes] -0.3781 0.122 -3.100 0.002 -0.618 -0.138\n", "beauty 0.0006 0.080 0.008 0.994 -0.156 0.157\n", "gender[T.male]:beauty 0.1362 0.124 1.094 0.274 -0.108 0.381\n", "beauty:tenure[T.yes] 0.1301 0.099 1.315 0.189 -0.064 0.325\n", "gender[T.male]:beauty:tenure[T.yes] -0.0934 0.146 -0.640 0.522 -0.380 0.193\n", "=======================================================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# three variable interaction\n", "three_interact = smf.ols('eval ~ gender * beauty * tenure', data=profs).fit()\n", "three_interact.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "b07f0191-41fa-47d4-9c4b-788077a41835", "metadata": {}, "source": [ "Confusion abounds when looking at the coefficients. Let's make sense of this by asking for predictions from the model. Generate a data grid that asks for beauty to be evaluated at [-2, 0, 2] (that is, 2 standard deviations below average, average, and 2 above, as the variable was scaled this way by the authors), for tenured and non-tenured males and females."
] }, { "cell_type": "code", "execution_count": 22, "id": "ba8978f1-5faf-43c9-a7e7-d8416d1c9481", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (12, 13)
gendertenurebeautyrownamesminorityagecreditsevaldivisionnativestudentsallstudentsprof
strstri64i64stri64strf64strstri64i64i64
"male""yes"-2243"no"52"more"3.998272"upper""yes"121534
"male""yes"0243"no"52"more"3.998272"upper""yes"121534
"male""yes"2243"no"52"more"3.998272"upper""yes"121534
"male""no"-2243"no"52"more"3.998272"upper""yes"121534
"male""no"0243"no"52"more"3.998272"upper""yes"121534
"female""yes"0243"no"52"more"3.998272"upper""yes"121534
"female""yes"2243"no"52"more"3.998272"upper""yes"121534
"female""no"-2243"no"52"more"3.998272"upper""yes"121534
"female""no"0243"no"52"more"3.998272"upper""yes"121534
"female""no"2243"no"52"more"3.998272"upper""yes"121534
" ], "text/plain": [ "shape: (12, 13)\n", "┌────────┬────────┬────────┬──────────┬───┬────────┬──────────┬─────────────┬──────┐\n", "│ gender ┆ tenure ┆ beauty ┆ rownames ┆ … ┆ native ┆ students ┆ allstudents ┆ prof │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ i64 ┆ i64 ┆ i64 │\n", "╞════════╪════════╪════════╪══════════╪═══╪════════╪══════════╪═════════════╪══════╡\n", "│ male ┆ yes ┆ -2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ male ┆ yes ┆ 0 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ male ┆ yes ┆ 2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ male ┆ no ┆ -2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ male ┆ no ┆ 0 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n", "│ female ┆ yes ┆ 0 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ female ┆ yes ┆ 2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ female ┆ no ┆ -2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ female ┆ no ┆ 0 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "│ female ┆ no ┆ 2 ┆ 243 ┆ … ┆ yes ┆ 12 ┆ 15 ┆ 34 │\n", "└────────┴────────┴────────┴──────────┴───┴────────┴──────────┴─────────────┴──────┘" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Get predictions\n", "predmat = me.datagrid(three_interact, \n", " gender=['male', 'female'],\n", " tenure=['yes', 'no'],\n", " beauty=[-2, 0, 2])\n", "\n", "# Show\n", "predmat" ] }, { "cell_type": "markdown", "id": "3b021e15-879a-448c-9dbd-77749685063c", "metadata": {}, "source": [ "Once done, ask the model to predict the outcomes, and plot them. There are now three variables to deal with - so think carefully how you plot them!" 
] }, { "cell_type": "code", "execution_count": 23, "id": "dd087ee6-74a8-400e-be05-653aefbf14f8", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Your answer here\n", "# Predictions\n", "three_preds = me.predictions(three_interact, newdata=predmat)\n", "\n", "# Plot to show interaction pattern overall - lots of ways to do this, e.g.\n", "sns.lineplot(data=three_preds,\n", " x='beauty', \n", " y='estimate',\n", " style='gender',\n", " hue='tenure')\n", "\n", "# Or\n", "sns.relplot(data=three_preds,\n", " x='beauty', y='estimate',\n", " style='tenure', col='gender',\n", " kind='line')\n" ] }, { "cell_type": "markdown", "id": "9602f4ba-6845-414b-96f1-fb59f5e2a9e9", "metadata": {}, "source": [ "If you have visualised it correctly, you should see the general pattern that male evaluations increase with beauty, but they are lower with tenure. Females on the other only show a positive beauty association *with* tenure, and no association without it.\n", "\n", "Now, are those differences meaningful? To test that we need to make a decision about how we want to evaluate our interaction. That depends on the question, since there are many ways to interpret interactions of this complexity.\n", "\n", "Do this in steps. First, is the association between beauty and evaluations different for females and males? " ] }, { "cell_type": "code", "execution_count": 24, "id": "855c8231-82f0-462e-b134-4cbf9e439ec3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"-0.089490.072316-1.2374950.2159032.211543-0.2312260.052246
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬───────┬─────────┬──────┬────────┬────────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪═══════╪═════════╪══════╪════════╪════════╡\n", "│ Row 1 - Row 2 ┆ -0.0895 ┆ 0.0723 ┆ -1.24 ┆ 0.216 ┆ 2.21 ┆ -0.231 ┆ 0.0522 │\n", "└───────────────┴──────────┴───────────┴───────┴─────────┴──────┴────────┴────────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "# Slopes are required\n", "me.slopes(three_interact, newdata=predmat, variables='beauty', by=['gender'], hypothesis='pairwise')" ] }, { "cell_type": "markdown", "id": "832f7b9d-ccdc-4c0e-86e1-26b91225bed4", "metadata": {}, "source": [ "Is the association between beauty and evaluations different between tenure status?" ] }, { "cell_type": "code", "execution_count": 25, "id": "3c77d7bd-1951-4be4-8442-6e672b490209", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"-0.0834010.073588-1.1333510.2570671.959784-0.2276320.060829
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────────────┬──────────┬───────────┬───────┬─────────┬──────┬────────┬────────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════════════╪══════════╪═══════════╪═══════╪═════════╪══════╪════════╪════════╡\n", "│ Row 1 - Row 2 ┆ -0.0834 ┆ 0.0736 ┆ -1.13 ┆ 0.257 ┆ 1.96 ┆ -0.228 ┆ 0.0608 │\n", "└───────────────┴──────────┴───────────┴───────┴─────────┴──────┴────────┴────────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.slopes(three_interact, newdata=predmat, variables='beauty', by=['tenure'], hypothesis='pairwise')" ] }, { "cell_type": "markdown", "id": "fa2d813b-c978-4385-a5a4-69994dfaf283", "metadata": {}, "source": [ "Finally, generate the slopes for all four conditions (male, female, tenured, non-tenured)." ] }, { "cell_type": "code", "execution_count": 26, "id": "7db0289c-a82e-4cef-b532-40fa86c5adab", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (4, 11)
gendertenuretermcontrastestimatestd_errorstatisticp_values_valueconf_lowconf_high
strstrstrstrf64f64f64f64f64f64f64
"female""no""beauty""mean(dY/dX)"0.0006390.0778960.0082020.9934560.009472-0.1520340.153312
"male""no""beauty""mean(dY/dX)"0.1368520.0956171.4312560.1523572.714474-0.0505530.324257
"female""yes""beauty""mean(dY/dX)"0.1307630.0586092.2310990.0256755.2835150.0158910.245636
"male""yes""beauty""mean(dY/dX)"0.173530.048843.5530720.00038111.3588310.0778070.269254
" ], "text/plain": [ "shape: (4, 11)\n", "┌────────┬────────┬────────┬─────────────┬───┬──────────┬─────────┬─────────┬───────┐\n", "│ gender ┆ tenure ┆ Term ┆ Contrast ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n", "╞════════╪════════╪════════╪═════════════╪═══╪══════════╪═════════╪═════════╪═══════╡\n", "│ female ┆ no ┆ beauty ┆ mean(dY/dX) ┆ … ┆ 0.993 ┆ 0.00947 ┆ -0.152 ┆ 0.153 │\n", "│ male ┆ no ┆ beauty ┆ mean(dY/dX) ┆ … ┆ 0.152 ┆ 2.71 ┆ -0.0506 ┆ 0.324 │\n", "│ female ┆ yes ┆ beauty ┆ mean(dY/dX) ┆ … ┆ 0.0257 ┆ 5.28 ┆ 0.0159 ┆ 0.246 │\n", "│ male ┆ yes ┆ beauty ┆ mean(dY/dX) ┆ … ┆ 0.000381 ┆ 11.4 ┆ 0.0778 ┆ 0.269 │\n", "└────────┴────────┴────────┴─────────────┴───┴──────────┴─────────┴─────────┴───────┘\n", "\n", "Columns: gender, tenure, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.slopes(three_interact, newdata=predmat, variables='beauty', by=['tenure', 'gender'])" ] }, { "cell_type": "markdown", "id": "0067943d-7b0a-459d-ba62-6201c066ca46", "metadata": {}, "source": [ "Finally, lets ask a sex-specific question of the model. Compare the slopes of females who are tenured and non-tenured here - are they different?" ] }, { "cell_type": "code", "execution_count": 27, "id": "c2423e4c-7d48-4724-931c-60aea03b2b28", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"b1=b3"-0.1301240.098895-1.3157820.1882472.409299-0.3239550.063707
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────┬──────────┬───────────┬───────┬─────────┬──────┬────────┬────────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════╪══════════╪═══════════╪═══════╪═════════╪══════╪════════╪════════╡\n", "│ b1=b3 ┆ -0.13 ┆ 0.0989 ┆ -1.32 ┆ 0.188 ┆ 2.41 ┆ -0.324 ┆ 0.0637 │\n", "└───────┴──────────┴───────────┴───────┴─────────┴──────┴────────┴────────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.slopes(three_interact, newdata=predmat, \n", " variables='beauty', by=['tenure', 'gender'],\n", " hypothesis='b1=b3')" ] }, { "cell_type": "markdown", "id": "c56c3cb4-e675-4fab-96a2-ed9a66088530", "metadata": {}, "source": [ "Then compare this for males." ] }, { "cell_type": "code", "execution_count": 28, "id": "aedb97f7-b39a-4c25-b038-913222954b7d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"b2=b4"-0.0366780.107497-0.3412020.7329510.44821-0.2473690.174013
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────┬──────────┬───────────┬────────┬─────────┬───────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════╪══════════╪═══════════╪════════╪═════════╪═══════╪════════╪═══════╡\n", "│ b2=b4 ┆ -0.0367 ┆ 0.107 ┆ -0.341 ┆ 0.733 ┆ 0.448 ┆ -0.247 ┆ 0.174 │\n", "└───────┴──────────┴───────────┴────────┴─────────┴───────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.slopes(three_interact, newdata=predmat, \n", " variables='beauty', by=['tenure', 'gender'],\n", " hypothesis='b2=b4')" ] }, { "cell_type": "markdown", "id": "d068eddf-ee94-4897-9afb-8d049b32dfbd", "metadata": {}, "source": [ "What effect does beauty have on teaching evaluations?" ] }, { "cell_type": "markdown", "id": "df7b9fe3-1364-4953-a4ba-6f08cb992155", "metadata": {}, "source": [ "### h. Personalised predictions\n", "For the final exercise, we will work to understand how to use models to make individual predictions. \n", "\n", "We'll use a new dataset, recorded by Daniel Hamermesh, an economist interested in the effects of physical attractiveness in the labour market (e.g. the halo effect). You can find this dataset (called `beauty` by the authors), here: https://vincentarelbundock.github.io/Rdatasets/csv/wooldridge/beauty.csv\n", "\n", "Also, you can see a description of these predictors here: https://vincentarelbundock.github.io/Rdatasets/doc/wooldridge/beauty.html\n", "Take a look!\n", "\n", "Read it into a dataframe called `beauty`, and print the first 5 rows." 
] }, { "cell_type": "code", "execution_count": 29, "id": "c1383176-e4cc-4910-bf43-b4f503b656a9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rownameswagelwagebelavgabvavgexperlooksuniongoodhlthblackfemalemarriedsouthbigcitysmllcityserviceexpersqeduc
015.731.7457150130401011001190014
124.281.4539530028301011101078412
237.962.07442901354010100010122510
3411.572.44841600383010010101144416
4511.422.4353660027301001001072916
\n", "
" ], "text/plain": [ " rownames wage lwage belavg abvavg exper looks union goodhlth \\\n", "0 1 5.73 1.745715 0 1 30 4 0 1 \n", "1 2 4.28 1.453953 0 0 28 3 0 1 \n", "2 3 7.96 2.074429 0 1 35 4 0 1 \n", "3 4 11.57 2.448416 0 0 38 3 0 1 \n", "4 5 11.42 2.435366 0 0 27 3 0 1 \n", "\n", " black female married south bigcity smllcity service expersq educ \n", "0 0 1 1 0 0 1 1 900 14 \n", "1 0 1 1 1 0 1 0 784 12 \n", "2 0 1 0 0 0 1 0 1225 10 \n", "3 0 0 1 0 1 0 1 1444 16 \n", "4 0 0 1 0 0 1 0 729 16 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "beauty = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/wooldridge/beauty.csv')\n", "beauty.head()" ] }, { "cell_type": "markdown", "id": "b8ef23bd-bfae-41e4-a963-2a8f8e894fe7", "metadata": {}, "source": [ "We will aim to build a model that predicts wages/income from a series of predictors, one of which includes `looks`, which runs from 1 - 5 (5 being very attractive). Specifically, we'll use:\n", "\n", "- Looks\n", "- years of workforce experience\n", "- union member status\n", "- health status\n", "- ethnicity (here only black vs other)\n", "- sex\n", "- years of education\n", "\n", "Using the description in the above link, build a linear regression model that predicts wages from those variables." ] }, { "cell_type": "code", "execution_count": 30, "id": "39b2f115-bd8a-4f19-be3a-6baf1fc31fb9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: wage R-squared: 0.196
Model: OLS Adj. R-squared: 0.192
No. Observations: 1260 F-statistic: 43.61
Covariance Type: nonrobust Prob (F-statistic): 2.48e-55
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept -1.4917 0.952 -1.567 0.117 -3.359 0.376
looks 0.3944 0.176 2.237 0.025 0.049 0.740
female -2.4942 0.260 -9.603 0.000 -3.004 -1.985
exper 0.0860 0.011 8.145 0.000 0.065 0.107
union 0.7966 0.268 2.969 0.003 0.270 1.323
goodhlth -0.0606 0.481 -0.126 0.900 -1.004 0.882
black 0.0028 0.460 0.006 0.995 -0.899 0.904
educ 0.4520 0.047 9.630 0.000 0.360 0.544


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & wage & \\textbf{ R-squared: } & 0.196 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.192 \\\\\n", "\\textbf{No. Observations:} & 1260 & \\textbf{ F-statistic: } & 43.61 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 2.48e-55 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{Intercept} & -1.4917 & 0.952 & -1.567 & 0.117 & -3.359 & 0.376 \\\\\n", "\\textbf{looks} & 0.3944 & 0.176 & 2.237 & 0.025 & 0.049 & 0.740 \\\\\n", "\\textbf{female} & -2.4942 & 0.260 & -9.603 & 0.000 & -3.004 & -1.985 \\\\\n", "\\textbf{exper} & 0.0860 & 0.011 & 8.145 & 0.000 & 0.065 & 0.107 \\\\\n", "\\textbf{union} & 0.7966 & 0.268 & 2.969 & 0.003 & 0.270 & 1.323 \\\\\n", "\\textbf{goodhlth} & -0.0606 & 0.481 & -0.126 & 0.900 & -1.004 & 0.882 \\\\\n", "\\textbf{black} & 0.0028 & 0.460 & 0.006 & 0.995 & -0.899 & 0.904 \\\\\n", "\\textbf{educ} & 0.4520 & 0.047 & 9.630 & 0.000 & 0.360 & 0.544 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: wage R-squared: 0.196\n", "Model: OLS Adj. R-squared: 0.192\n", "No. 
Observations: 1260 F-statistic: 43.61\n", "Covariance Type: nonrobust Prob (F-statistic): 2.48e-55\n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -1.4917 0.952 -1.567 0.117 -3.359 0.376\n", "looks 0.3944 0.176 2.237 0.025 0.049 0.740\n", "female -2.4942 0.260 -9.603 0.000 -3.004 -1.985\n", "exper 0.0860 0.011 8.145 0.000 0.065 0.107\n", "union 0.7966 0.268 2.969 0.003 0.270 1.323\n", "goodhlth -0.0606 0.481 -0.126 0.900 -1.004 0.882\n", "black 0.0028 0.460 0.006 0.995 -0.899 0.904\n", "educ 0.4520 0.047 9.630 0.000 0.360 0.544\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "beauty_mod = smf.ols('wage ~ looks + female + exper + union + goodhlth + black + educ', data=beauty).fit()\n", "beauty_mod.summary(slim=True)" ] }, { "cell_type": "markdown", "id": "53fa7f84-2f61-4bd2-84ed-8075f66b7443", "metadata": {}, "source": [ "Despite the number of variables, with some effort we could interpret these coefficients. For example, we might say that for each one-point increase in rated looks, the predicted hourly wage increases by about 0.39 (that is, roughly 39 cents), all else being equal. But interpretation becomes more difficult when the implications span multiple predictors at once. \n", "\n", "Use the model to generate predictions for two individuals, one male and one female. The female should have 17 years of education and the male 12 (about average). \n", "\n", "What's the difference in their wages? Is it statistically significant?" 
] }, { "cell_type": "code", "execution_count": 31, "id": "8b2ec684-b08f-4432-ac58-4d0d1b1c4964", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"b1=b4"-0.2342650.353636-0.6624460.5076850.977993-0.9273780.458849
" ], "text/plain": [ "shape: (1, 8)\n", "┌───────┬──────────┬───────────┬────────┬─────────┬───────┬────────┬───────┐\n", "│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n", "╞═══════╪══════════╪═══════════╪════════╪═════════╪═══════╪════════╪═══════╡\n", "│ b1=b4 ┆ -0.234 ┆ 0.354 ┆ -0.662 ┆ 0.508 ┆ 0.978 ┆ -0.927 ┆ 0.459 │\n", "└───────┴──────────┴───────────┴────────┴─────────┴───────┴────────┴───────┘\n", "\n", "Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.predictions(beauty_mod,\n", " newdata=me.datagrid(beauty_mod, \n", " female=[1, 0],\n", " educ=[17, 12]),\n", " hypothesis='b1=b4'\n", " )" ] }, { "cell_type": "markdown", "id": "fb64a26c-e8a0-40f1-bd05-9992c82aca9a", "metadata": {}, "source": [ "Consider next the implications of looks on earnings. \n", "\n", "What's the difference in earnings for a male with 5 years work experience, 15 years of education, who has a 5 in terms of looks, and a counterpart who has the same values but has a 3 on looks?" ] }, { "cell_type": "code", "execution_count": 32, "id": "e172e5ae-4873-49bb-a41b-93b6351d64ca", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (2, 26)
femaleexpereduclooksrowidestimatestd_errorstatisticp_values_valueconf_lowconf_highrownameswagelwagebelavgabvavguniongoodhlthblackmarriedsouthbigcitysmllcityserviceexpersq
i64i64i64i64i32f64f64f64f64f64f64f64i64f64f64i64i64i64i64i64i64i64i64i64i64i64
0515306.8407910.24758227.630430.0inf6.355547.3260423906.306691.65880001010000100
0515517.6295430.37108420.5601320.0inf6.9022318.3568553906.306691.65880001010000100
" ], "text/plain": [ "shape: (2, 11)\n", "┌────────┬───────┬──────┬───────┬───┬─────────┬─────┬──────┬───────┐\n", "│ female ┆ exper ┆ educ ┆ looks ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n", "╞════════╪═══════╪══════╪═══════╪═══╪═════════╪═════╪══════╪═══════╡\n", "│ 0 ┆ 5 ┆ 15 ┆ 3 ┆ … ┆ 0 ┆ inf ┆ 6.36 ┆ 7.33 │\n", "│ 0 ┆ 5 ┆ 15 ┆ 5 ┆ … ┆ 0 ┆ inf ┆ 6.9 ┆ 8.36 │\n", "└────────┴───────┴──────┴───────┴───┴─────────┴─────┴──────┴───────┘\n", "\n", "Columns: female, exper, educ, looks, rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, wage, lwage, belavg, abvavg, union, goodhlth, black, married, south, bigcity, smllcity, service, expersq" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your answer here\n", "me.predictions(beauty_mod,\n", " newdata=me.datagrid(beauty_mod, \n", " female=[0],\n", " exper=[5],\n", " educ=[15],\n", " looks=[3, 5])\n", " )" ] }, { "cell_type": "markdown", "id": "add3e964-a56e-4fc7-a3d5-23ebde3273b8", "metadata": {}, "source": [ "If you have mastered the above exercise, then you have a solid understanding of how most of the modern world works - for example, insurance companies use this exact approach to determine your premiums! As ever, these inferences are only as good as the model and the data. " ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 5 }